Lab 5: Evaluation and Multi-Layer Perceptron

Cameron Matson

Zihao Mao

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
import os

Data Overview and Business Explanation

In this lab we aim to predict the position of an NBA player given some set of descriptors and statistics of their play. The original data set contains player information and stats for every player who has played in the NBA from 1950 to the present.

Professional sports teams in general, and basketball teams in particular, have turned in recent years to data to gain an edge over their opponents. Part of the responsibility of the team ownership and coaching staff is to assemble a team of players that gives them the best chance to win. Players on an NBA team have specific roles, largely based on the position that they play. It is, therefore, important that players play in the position that most helps their team to win. However, in today's NBA, positions are much more fluid, and players can play positions that the classical basketball system wouldn't expect (see the Golden State Warriors of late). We would like to create a classifier that would help NBA teams decide which position players should play, which may not necessarily be the position they have actually played. This could especially be the case for players coming from college, where teams generally run systems quite different from those employed in the NBA, or for teams looking to make a change but unable to trade for a new player; they might instead be able to reposition one of their current players.

What our classifier will do, then, is look at the stats and player details for each position and, given a new set of stats and details, make a probabilistic estimate of which position that player plays. If the result diverges from the player's listed position, this might be an indication that the player is not playing the correct position.

With respect to accuracy, it's important to keep in mind that our classifier is meant to aid basketball personnel decisions, not make them authoritatively. Its primary use is as a tool to compare a player to the performance and description of players in the past, and to report on the similarities the unknown player has to the different positions historically. So while we'd like to see predictions as close to perfect as possible on splits of our dataset, what will ultimately be most useful to us and to NBA teams is a reported probability for a player's position that is significantly higher than that of the other possible positions.

Data Inspection and Cleaning

In [2]:
# first let's load the datasets in

data_path = '../data/basketball'
players = pd.read_csv(os.path.join(data_path, 'players.csv'))
players.head()
Out[2]:
Unnamed: 0 Player height weight collage born birth_city birth_state
0 0 Curly Armstrong 180.0 77.0 Indiana University 1918.0 NaN NaN
1 1 Cliff Barker 188.0 83.0 University of Kentucky 1921.0 Yorktown Indiana
2 2 Leo Barnhorst 193.0 86.0 University of Notre Dame 1924.0 NaN NaN
3 3 Ed Bartels 196.0 88.0 North Carolina State University 1925.0 NaN NaN
4 4 Ralph Beard 178.0 79.0 University of Kentucky 1927.0 Hardinsburg Kentucky

All we want from this is the player's name and their height (cm) and weight (kg).

In [3]:
players.drop(['Unnamed: 0', 'collage', 'birth_city', 'birth_state', 'born'], axis=1, inplace=True)
players.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3922 entries, 0 to 3921
Data columns (total 3 columns):
Player    3921 non-null object
height    3921 non-null float64
weight    3921 non-null float64
dtypes: float64(2), object(1)
memory usage: 92.0+ KB

Good. They're almost all non-null (one of the 3,922 rows is missing values), and they're the correct datatypes.

Now let's load the players stats.

In [4]:
stats = pd.read_csv(os.path.join(data_path, 'seasons_stats.csv'))
stats.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 24691 entries, 0 to 24690
Data columns (total 53 columns):
Unnamed: 0    24691 non-null int64
Year          24624 non-null float64
Player        24624 non-null object
Pos           24624 non-null object
Age           24616 non-null float64
Tm            24624 non-null object
G             24624 non-null float64
GS            18233 non-null float64
MP            24138 non-null float64
PER           24101 non-null float64
TS%           24538 non-null float64
3PAr          18839 non-null float64
FTr           24525 non-null float64
ORB%          20792 non-null float64
DRB%          20792 non-null float64
TRB%          21571 non-null float64
AST%          22555 non-null float64
STL%          20792 non-null float64
BLK%          20792 non-null float64
TOV%          19582 non-null float64
USG%          19640 non-null float64
blanl         0 non-null float64
OWS           24585 non-null float64
DWS           24585 non-null float64
WS            24585 non-null float64
WS/48         24101 non-null float64
blank2        0 non-null float64
OBPM          20797 non-null float64
DBPM          20797 non-null float64
BPM           20797 non-null float64
VORP          20797 non-null float64
FG            24624 non-null float64
FGA           24624 non-null float64
FG%           24525 non-null float64
3P            18927 non-null float64
3PA           18927 non-null float64
3P%           15416 non-null float64
2P            24624 non-null float64
2PA           24624 non-null float64
2P%           24496 non-null float64
eFG%          24525 non-null float64
FT            24624 non-null float64
FTA           24624 non-null float64
FT%           23766 non-null float64
ORB           20797 non-null float64
DRB           20797 non-null float64
TRB           24312 non-null float64
AST           24624 non-null float64
STL           20797 non-null float64
BLK           20797 non-null float64
TOV           19645 non-null float64
PF            24624 non-null float64
PTS           24624 non-null float64
dtypes: float64(49), int64(1), object(3)
memory usage: 10.0+ MB

There are a lot of fields here, and they're pretty inconsistently filled. Some of this arises from the fact that the data covers such a long timeline. For example, in 1950 there was no such thing as a 3-pointer, so it wouldn't make sense for those players to have 3-point stats.

Inspecting the dataset a little further, we notice that there is no stat for points per game (PPG). The total number of points scored is listed, but that is hard to compare across seasons in which players played different numbers of games. To make the points column a valid comparison measure, we'll only consider seasons with the current full 82-game schedule. This doesn't reduce the power of the dataset by much: the league moved to an 82-game season in 1967, and only the lockout-shortened 1998-99 season didn't have a full schedule.
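
As an aside (not part of our cleaning pipeline): if per-game rates were wanted, they can be derived from the season totals by dividing by games played. A sketch on made-up numbers:

```python
import pandas as pd

# hypothetical season totals, for illustration only
df = pd.DataFrame({'Player': ['A', 'B'],
                   'PTS':    [1640, 820],
                   'G':      [82, 41]})

# points per game = season total / games played
df['PPG'] = df['PTS'] / df['G']
print(df[['Player', 'PPG']])
```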

Actually, we might want to limit it to just seasons after 1980, when the 3-pointer was introduced. That should make the prediction task easier, although we lose even more of the dataset. But if we consider the business case as deciding player positions TODAY, it makes sense.

In [5]:
stats = stats[stats.Year >= 1980]
stats = stats[stats.Year != 1998]
stats.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 18380 entries, 5727 to 24690
Data columns (total 53 columns):
Unnamed: 0    18380 non-null int64
Year          18380 non-null float64
Player        18380 non-null object
Pos           18380 non-null object
Age           18380 non-null float64
Tm            18380 non-null object
G             18380 non-null float64
GS            17686 non-null float64
MP            18380 non-null float64
PER           18375 non-null float64
TS%           18307 non-null float64
3PAr          18295 non-null float64
FTr           18295 non-null float64
ORB%          18375 non-null float64
DRB%          18375 non-null float64
TRB%          18375 non-null float64
AST%          18375 non-null float64
STL%          18375 non-null float64
BLK%          18375 non-null float64
TOV%          18321 non-null float64
USG%          18375 non-null float64
blanl         0 non-null float64
OWS           18380 non-null float64
DWS           18380 non-null float64
WS            18380 non-null float64
WS/48         18375 non-null float64
blank2        0 non-null float64
OBPM          18380 non-null float64
DBPM          18380 non-null float64
BPM           18380 non-null float64
VORP          18380 non-null float64
FG            18380 non-null float64
FGA           18380 non-null float64
FG%           18295 non-null float64
3P            18380 non-null float64
3PA           18380 non-null float64
3P%           14969 non-null float64
2P            18380 non-null float64
2PA           18380 non-null float64
2P%           18266 non-null float64
eFG%          18295 non-null float64
FT            18380 non-null float64
FTA           18380 non-null float64
FT%           17657 non-null float64
ORB           18380 non-null float64
DRB           18380 non-null float64
TRB           18380 non-null float64
AST           18380 non-null float64
STL           18380 non-null float64
BLK           18380 non-null float64
TOV           18380 non-null float64
PF            18380 non-null float64
PTS           18380 non-null float64
dtypes: float64(49), int64(1), object(3)
memory usage: 7.6+ MB

Now, to start, let's just focus on a few categories:

  • Player
  • games played (G)
  • minutes played (MP)
  • field goals and field goal attempts (FG, FGA)
  • free throws (FT, FTA), two-pointers (2P, 2PA), and three-pointers (3P, 3PA)
  • offensive, defensive, and total rebounds (ORB, DRB, TRB)
  • assists (AST)
  • steals (STL)
  • blocks (BLK)
  • turnovers (TOV)
  • personal fouls (PF)
  • points (PTS)

And of course our label: position. We could probably use any of the features as a label, actually, and see if one could predict performance in one aspect of the game based on information about another. But for now we'll stick with predicting position.

In [6]:
stats_to_keep = {'Player', 'Pos', 'G', 'MP', 'FG', 'FGA', 'FT', 'FTA',
                '2P', '2PA', '3P', '3PA', 'ORB', 'DRB', 'TRB', 'AST', 'STL', 'BLK',
                'TOV', 'PF', 'PTS'}

stats_to_drop = set(stats.columns)-stats_to_keep
stats.drop(stats_to_drop, axis=1, inplace=True)
stats.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 18380 entries, 5727 to 24690
Data columns (total 21 columns):
Player    18380 non-null object
Pos       18380 non-null object
G         18380 non-null float64
MP        18380 non-null float64
FG        18380 non-null float64
FGA       18380 non-null float64
3P        18380 non-null float64
3PA       18380 non-null float64
2P        18380 non-null float64
2PA       18380 non-null float64
FT        18380 non-null float64
FTA       18380 non-null float64
ORB       18380 non-null float64
DRB       18380 non-null float64
TRB       18380 non-null float64
AST       18380 non-null float64
STL       18380 non-null float64
BLK       18380 non-null float64
TOV       18380 non-null float64
PF        18380 non-null float64
PTS       18380 non-null float64
dtypes: float64(19), object(2)
memory usage: 3.1+ MB

Okay. Finally, let's add the player description data to the stats dataframe.

In [7]:
stats['height'] = np.nan
stats['weight'] = np.nan

iplayer = players.set_index(keys='Player')
istats = stats.reset_index(drop=True)
for i, row in istats.iterrows():
    name = row['Player']  # the player's name column
    h = iplayer.loc[name].loc['height']
    w = iplayer.loc[name].loc['weight']
    
    # height and weight show up in the last two columns
    istats.iloc[i, len(istats.columns) - 2] = h
    istats.iloc[i, len(istats.columns) - 1] = w

stats = istats
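
As an aside, the row-by-row lookup above can also be expressed as a single vectorized join. A sketch on toy frames (the column names mirror ours, the values are made up):

```python
import pandas as pd

stats_toy = pd.DataFrame({'Player': ['Curly Armstrong', 'Cliff Barker'],
                          'PTS': [500, 300]})
players_toy = pd.DataFrame({'Player': ['Curly Armstrong', 'Cliff Barker'],
                            'height': [180.0, 188.0],
                            'weight': [77.0, 83.0]})

# left join keeps every stats row; unmatched names would get NaN height/weight
merged = stats_toy.merge(players_toy, on='Player', how='left')
print(merged)
```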
In [8]:
# and now we don't need the names anymore
players = stats.Player
stats.drop(['Player'], axis=1, inplace=True)

Finally, let's convert the position from a string to a number 1-5 and divide the dataset into data and labels (X and y). There are technically more than 5 positions listed (some players are listed at multiple positions), but we'll just go off of the first-listed, primary position.

In [9]:
# first we need to separate out the label from the data
y = stats.Pos
set(y)
Out[9]:
{'C',
 'C-PF',
 'C-SF',
 'PF',
 'PF-C',
 'PF-SF',
 'PG',
 'PG-SF',
 'PG-SG',
 'SF',
 'SF-PF',
 'SF-SG',
 'SG',
 'SG-PF',
 'SG-PG',
 'SG-SF'}

Let's reclassify this numerically and only count each player's 'primary' position, so that each player is assigned a position 1-5: {(1, PG), (2, SG), (3, SF), (4, PF), (5, C)}.

In [10]:
import numpy as np
def convert_pos(y):
    newy = np.zeros((len(y), 1))
    for i, player in enumerate(y):
        if (player[0] == 'C'):
            newy[i] = 5
        elif (player[0:2] == 'PF'):
            newy[i] = 4
        elif (player[0:2] == 'SF'):
            newy[i] = 3
        elif (player[0:2] == 'SG'):
            newy[i] = 2
        elif (player[0:2] == 'PG'):
            newy[i] = 1
    return newy
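
For reference, an equivalent vectorized version of convert_pos (a sketch, assuming the same primary-position convention and a list or Series of position strings):

```python
import pandas as pd

# primary position -> numeric label, as in convert_pos
POS_MAP = {'PG': 1, 'SG': 2, 'SF': 3, 'PF': 4, 'C': 5}

def convert_pos_vec(pos):
    """Map e.g. 'SG-PF' -> 2 by keeping only the first-listed position."""
    primary = pd.Series(pos).str.split('-').str[0]
    return primary.map(POS_MAP).values

print(convert_pos_vec(['C', 'PF-C', 'SG-PF', 'PG']))  # [5 4 2 1]
```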
In [11]:
y = convert_pos(y)
y
Out[11]:
array([[ 5.],
       [ 4.],
       [ 5.],
       ..., 
       [ 5.],
       [ 3.],
       [ 5.]])
In [12]:
# let's update it in the dataframe just for fun

stats['Pos'] = y
y = y.ravel()
In [13]:
from sklearn import preprocessing

# now we can drop the label, and we'll scale the features to zero mean and unit variance
X = stats.drop(['Pos'], axis=1)
ss = preprocessing.StandardScaler()
X = ss.fit_transform(X)
X
Out[13]:
array([[ 1.21172761,  2.14526944,  3.57664011, ...,  3.22124294,
         1.86568564,  0.4192247 ],
       [ 0.64977442,  0.06504059, -0.17905556, ..., -0.26000999,
         0.08122751,  0.17949879],
       [ 0.94948279,  1.08945314,  1.5390926 , ...,  1.31404935,
         0.60606814, -0.14013575],
       ..., 
       [-1.14847578, -1.14129722, -0.96654014, ..., -0.96583819,
         1.34084501,  0.8986765 ],
       [-0.21188713, -0.34537416, -0.5370031 , ..., -0.51402486,
         0.29116376,  0.01968152],
       [-0.4366684 , -0.59877007, -0.32774146, ..., -0.42241294,
         1.65574939,  1.85758012]])
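
One caution worth noting: fitting the scaler on the full dataset before splitting lets test-set statistics leak into training. A leak-free ordering (a sketch on made-up data) fits the scaler on the training portion only:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

# toy data standing in for our feature matrix (values are made up)
X_demo = np.random.RandomState(0).randn(100, 3)
y_demo = np.random.RandomState(1).randint(1, 6, size=100)

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2,
                                          random_state=0)
sc = StandardScaler().fit(X_tr)        # statistics from the training split only
X_tr, X_te = sc.transform(X_tr), sc.transform(X_te)
print(X_tr.mean(axis=0).round(6))      # training features now have ~zero mean
```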

Before we go any further, let's see what some of the distributions are.

In [14]:
np.bincount(y.astype(np.int32))
Out[14]:
array([   0, 3674, 3624, 3519, 3870, 3693], dtype=int64)
In [15]:
# first, how many of each class
import plotly
plotly.offline.init_notebook_mode() # run at the start of every notebook

graph1 = {'labels': ['PG', 'SG', 'SF', 'PF', 'C'],
          'values': np.bincount(y.astype(np.int32)-1), # -1 so the counts start at index 0
            'type': 'pie'}
fig = dict()
fig['data'] = [graph1]
fig['layout'] = {'title': 'Total Class Distribution',
                'autosize':False,
                'width':500,
                'height':400}

plotly.offline.iplot(fig)

Each of the five positions makes up about 20% of the dataset, which means that when we do our cross-validation we don't need to worry about stratified splits.

Metrics Selection

Accuracy, precision, recall, and the F1-measure could all be used to evaluate our method. For this data set we decided to use accuracy and the F-measure as our evaluation metrics. Accuracy is the most common metric, even though it doesn't consider the cost of misclassification, which can render accuracy scores mostly meaningless on skewed data. We still think it's an appropriate metric here because the class distribution in our data set is very even: each of the 5 classes covers roughly 20% of the overall dataset.

The F1-measure combines precision and recall. It helps compensate for accuracy's blindness to misclassification cost and rounds out our evaluation of the algorithm.

In order to train and tune the hyperparameters, it's helpful to split the original data set into a training set, a test set, and a validation set. It's a very large data set and the classes are well distributed. We use 20% of the whole data set as the test set, 80% as the training set, and 20% of the training set as the validation set.
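
The 80/20 split, with a further 20% of the training set held out for validation, can be produced with two calls to train_test_split. A sketch on toy data:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X_demo = np.arange(200).reshape(100, 2)   # stand-in feature matrix
y_demo = np.arange(100)                   # stand-in labels

# 20% of everything held out for testing...
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2,
                                          random_state=0)
# ...then 20% of what's left held out for validation
X_tr, X_val, y_tr, y_val = train_test_split(X_tr, y_tr, test_size=0.2,
                                            random_state=0)
print(len(X_tr), len(X_val), len(X_te))   # 64 16 20
```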

In [16]:
from sklearn.metrics import make_scorer, precision_score, recall_score, f1_score, accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2)

Multi-Layer Perceptron

Let's see if we can do this. Below is our implementation of the MLP, largely based on the code from class, but with a few changes and additions. It's massive.

In [17]:
# Example adapted from https://github.com/rasbt/python-machine-learning-book/blob/master/code/ch12/ch12.ipynb
# Original Author: Sebastian Raschka
# This is the optional book we use in the course, excellent intuitions and straightforward programming examples
# please note, however, that this code has been manipulated to reflect our assumptions and notation.
from scipy.special import expit
from sklearn.metrics import accuracy_score, f1_score
import sys
import warnings  # used by get_params below

# start with a simple base classifier, which can't be fit or predicted
# it only has internal classes to be used by classes that will subclass it
class MultiLayerPerceptron(object):
    def __init__(self, n_hidden_neurons=(30,),
                 C=0.0, epochs=500, eta=0.001, random_state=None,
                 activation='sigmoid', cost_function='mse',
                 tol=1e-6):
        
        np.random.seed(random_state)
        
        self.n_hidden_layers = len(n_hidden_neurons)
        
        self.n_hidden_neurons = n_hidden_neurons
        
        self.C = C
        self.epochs = epochs
        self.eta = eta
        
        self.activation = activation
        self.cost_function = cost_function
        
        self.tol = tol
        
        print('layers:', self.n_hidden_layers, 'neurons per layer:', self.n_hidden_neurons)
    @staticmethod
    def _encode_labels(y):
        """Encode labels into one-hot representation"""
        onehot = pd.get_dummies(y).values   
        return onehot

    def _initialize_weights(self):
        """Initialize weights with small random numbers."""
        # there are n_layers W matrices
        W_matrices = []
        
        if (self.n_hidden_layers == 0):
            print('no hidden neurons')
            W_num_elems = (self.n_features_+1)*self.n_output_
            W = np.random.uniform(-1.0, 1.0, size=W_num_elems)
            W = W.reshape(self.n_features_+1, self.n_output_)
            W_matrices.append(W)
            
            return W_matrices
        
        # initial layers
        W1_num_elems = (self.n_features_ + 1)*self.n_hidden_neurons[0] # better give us at least one
        W1 = np.random.uniform(-1.0, 1.0,size=W1_num_elems)
        W1 = W1.reshape(self.n_features_ + 1, self.n_hidden_neurons[0]) # (N+1 x S_0)
        
        W_matrices.append(W1)
        
        # hidden layers
        for S in range(1, self.n_hidden_layers):
            W_num_elems = (self.n_hidden_neurons[S-1] + 1)*(self.n_hidden_neurons[S])
            W = np.random.uniform(-1.0, 1.0, size=W_num_elems)
            W = W.reshape(self.n_hidden_neurons[S-1] + 1, self.n_hidden_neurons[S])
            W_matrices.append(W)
        
        # final layer
        Wf_num_elems = (self.n_hidden_neurons[-1] + 1) * self.n_output_
        Wf = np.random.uniform(-1.0, 1.0, size=Wf_num_elems)
        Wf = Wf.reshape(self.n_hidden_neurons[-1] + 1, self.n_output_)
        W_matrices.append(Wf)
        
        
        #print('W shapes')
        #for i, W in enumerate(W_matrices):
        #    print(i, W.shape)
        
        return W_matrices
    
    @staticmethod
    def _sigmoid(z):
        """Use scipy.special.expit to avoid overflow"""
        # 1.0 / (1.0 + np.exp(-z))
        return expit(z)
    
    @staticmethod
    def _add_bias_unit(X_train, how='column'):
        """Add bias unit (column or row of 1s) to array at index 0"""
        if how == 'column':
            ones = np.ones((X_train.shape[0], 1))
            X_new = np.hstack((ones, X_train))
        elif how == 'row':
            ones = np.ones((1, X_train.shape[1]))
            X_new = np.vstack((ones, X_train))
        return X_new
    
    @staticmethod
    def _L2_reg(lambda_, W_matrices):
    #def _L2_reg(lambda_, W1, W2):
        """Compute L2-regularization cost"""
        # only compute for non-bias terms
        s = 0
        for W in W_matrices:
            s += np.mean(W[1:,:]**2)
        
        return (lambda_/2.0) * np.sqrt(s)
    
    def _cost(self, A_final, Y_enc, W_matrices):
    #def _cost(self,A3,Y_enc,W1,W2):
        '''Get the objective function value'''
        cost = np.mean((Y_enc-A_final)**2)
        L2_term = self._L2_reg(self.C, W_matrices)
        return cost + L2_term
    
    #def _feedforward(self, X, W1, W2):
    def _feedforward(self, X_train, W_matrices):
        '''Compute feedforward step'''
        A_matrices = []
        Z_matrices = []
        
        # A1 is just the bias added data matrix X (M x N+1)
        A1 = self._add_bias_unit(X_train, how='column')
        A_matrices.append(A1)
        
        for S, W in enumerate(W_matrices):
            Z = A_matrices[S] @ W # ((M x N+1) x (N+1 x S_i)) = (M x S_i)
            Z_matrices.append(Z)

            # sigmoid or linear activation
            if (self.activation == 'sigmoid'):
                A = self._sigmoid(Z)
            else:
                A = Z
                
            A = self._add_bias_unit(A, how='column') # (M x S_i+1)
            A_matrices.append(A)
            
        # remove the bias from the last A
        A_matrices[-1] = A_matrices[-1][:,1:]
        
        return A_matrices, Z_matrices
    
    
    def _get_gradient(self, A_matrices, Z_matrices, Y_enc, W_matrices):
    #def _get_gradient(self, A1, A2, A3, Z1, Z2, Y_enc, W1, W2):
        """ Compute gradient step using backpropagation.
        """
        
        sigma_matrices = []
        grad_matrices =[]

        # different cost functions
        if (self.cost_function == 'mse'):
            sigma_final = -2*(Y_enc-A_matrices[-1])
            
            if (self.activation == 'sigmoid'):
                sigma_final *= A_matrices[-1]*(1-A_matrices[-1])
            
        elif (self.cost_function == 'cross'):
            sigma_final = (A_matrices[-1] - Y_enc)
           
        
        sigma_matrices.append(sigma_final)
        
        # based on penultimate A
        grad_final = sigma_final.T @ A_matrices[-2]
        grad_final[:, 1:] += (W_matrices[-1]).T[:, 1:] * self.C
        
        grad_matrices.append(grad_final)
        
        # move backwards starting with second to last
        for i in range(1, len(W_matrices)):
            A2 = A_matrices[-(1+i)]
            A1 = A_matrices[-(2+i)]
            W2 = W_matrices[-i]
            W1 = W_matrices[-(1+i)]
            
            sigma = (sigma_matrices[i-1] @ W2.T)
            
            if (self.activation == 'sigmoid'):
                sigma *= A2*(1-A2)
            
            # if linear we're done
            
            sigma = sigma[:, 1:] # remove bias column
            sigma_matrices.append(sigma)
            
            grad = sigma.T @ A1
            grad[:, 1:] += (W1.T)[:, 1:] * self.C
            grad_matrices.append(grad)
        
        # flip'em back around
        grad_matrices.reverse()
        return grad_matrices
        
    def predict(self, X_train):
        """Predict class labels"""
        A, _ = self._feedforward(X_train, self.W_matrices)
        Afinal = A[-1]
        y_pred = np.argmax(Afinal, axis=1) + 1 # +1 because positions are 1-indexed
        return y_pred
    
    def score(self, *args):
        # note: scores against the global training split (X_train, y_train)
        yhat = self.predict(X_train)
        return accuracy_score(y_train, yhat)
    
    def f_score(self,*args):
        yhat = self.predict(X_train)
        return f1_score(y_train,yhat,average='micro')
    
    def conf_matrix(self, *args):
        yhat = self.predict(X_train)
        return confusion_matrix(y_train,yhat)
    
     # lifted straight out of sklearn source code with some modifications
    @classmethod
    def _get_param_names(cls):
        # this is just specific to this classifier
        return sorted(['eta', 'epochs', 'C', 'n_hidden_neurons'])

    def get_params(self, deep=True):
        """Get parameters for this estimator.
        Parameters
        ----------
        deep : boolean, optional
            If True, will return the parameters for this estimator and
            contained subobjects that are estimators.
        Returns
        -------
        params : mapping of string to any
            Parameter names mapped to their values.
        """
        out = dict()
        for key in self._get_param_names():
            # We need deprecation warnings to always be on in order to
            # catch deprecated param values.
            # This is set in utils/__init__.py but it gets overwritten
            # when running under python3 somehow.
            warnings.simplefilter("always", DeprecationWarning)
            try:
                with warnings.catch_warnings(record=True) as w:
                    value = getattr(self, key, None)
                if len(w) and w[0].category == DeprecationWarning:
                    # if the parameter is deprecated, don't show it
                    continue
            finally:
                warnings.filters.pop(0)

            # XXX: should we rather test if instance of estimator?
            if deep and hasattr(value, 'get_params'):
                deep_items = value.get_params().items()
                out.update((key + '__' + k, val) for k, val in deep_items)
            out[key] = value
        return out
     
    def set_params(self, **params):
        """Set the parameters of this estimator.
        The method works on simple estimators as well as on nested objects
        (such as pipelines). The latter have parameters of the form
        ``<component>__<parameter>`` so that it's possible to update each
        component of a nested object.
        Returns
        -------
        self
        """
        if not params:
            # Simple optimization to gain speed (inspect is slow)
            return self
        valid_params = self.get_params(deep=True)
        # changed from six.iteritems() bc no need for py2 vs py3 compatibility
        for key, value in params.items():
            split = key.split('__', 1)
            if len(split) > 1:
                # nested objects case
                name, sub_name = split
                if name not in valid_params:
                    raise ValueError('Invalid parameter %s for estimator %s. '
                                     'Check the list of available parameters '
                                     'with `estimator.get_params().keys()`.' %
                                     (name, self))
                sub_object = valid_params[name]
                sub_object.set_params(**{sub_name: value})
            else:
                # simple objects case
                if key not in valid_params:
                    raise ValueError('Invalid parameter %s for estimator %s. '
                                     'Check the list of available parameters '
                                     'with `estimator.get_params().keys()`.' %
                                     (key, self.__class__.__name__))
                setattr(self, key, value)
        return self

    
    def fit(self, X, y, print_progress=False):
        """ Learn weights from training data."""
        
        X_data, y_data = X.copy(), y.copy()
        Y_enc = self._encode_labels(y)
        
        # init weights and setup matrices
        self.n_features_ = X_data.shape[1]
        self.n_output_ = Y_enc.shape[1]
        
        self.W_matrices = self._initialize_weights()
        
        self.cost_ = []
        self.score_ = []
        for i in range(self.epochs):

            if print_progress>0 and (i+1)%print_progress==0:
                sys.stderr.write('\rEpoch: %d/%d' % (i+1, self.epochs))
                sys.stderr.flush()
            
            # feedforward all instances
            A_matrices, Z_matrices = self._feedforward(X_data,self.W_matrices)

            # back prop
            grads = self._get_gradient(A_matrices, Z_matrices, Y_enc, self.W_matrices)
            for j, grad in enumerate(grads):
                self.W_matrices[j] -= self.eta * grad.T
            
            
            cost = self._cost(A_matrices[-1], Y_enc, self.W_matrices)
            self.cost_.append(cost)
            self.score_.append(self.score())
            
            # early stopping
            if i > 1 and abs(self.cost_[-1] - self.cost_[-2]) < self.tol:
                break
            
        self.epochs_actual_ = i + 1
        return self
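
As a quick sanity check on the label encoding the class uses internally: _encode_labels is just pd.get_dummies on the numeric labels, which produces one column per distinct label, with columns in sorted order.

```python
import pandas as pd

# same call as _encode_labels; columns come out as the sorted labels 1, 3, 5
onehot = pd.get_dummies([1, 5, 3]).values
print(onehot.astype(int))  # rows: 1 -> [1,0,0], 5 -> [0,0,1], 3 -> [0,1,0]
```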

Let's check to see how it does.

In [18]:
nn = MultiLayerPerceptron()
layers: 1 neurons per layer: (30,)
In [19]:
%%time
nn.fit(X_train, y_train, print_progress=1)
yhat = nn.predict(X_train)
Epoch: 500/500
Wall time: 16.6 s
In [20]:
print('Train acc: ', nn.score())  # score() evaluates on the training split
Train acc:  0.565628400435
In [21]:
plt.plot(range(len(nn.cost_)), nn.cost_, nn.score_)
plt.legend(['cost', 'score'])
plt.ylabel('Cost')
plt.xlabel('Epochs')
plt.tight_layout()
plt.show()

Hyperparameter Tuning via Grid Search

In [22]:
from sklearn.model_selection import GridSearchCV
import warnings
parameters = {'epochs': [50, 100, 300],
              'eta': np.logspace(base=10, start=-5, stop=1, num=4),
              'C': np.logspace(base=10, start=-3, stop=1, num=4)
             }
nncv = MultiLayerPerceptron(cost_function='mse', tol=1e-3)
clf = GridSearchCV(nncv, parameters, verbose=3)
clf.fit(X_train, y_train)
layers: 1 neurons per layer: (30,)
Fitting 3 folds for each of 48 candidates, totalling 144 fits
layers: 1 neurons per layer: (30,)
layers: 1 neurons per layer: (30,)
[CV] C=0.001, epochs=50, eta=1e-05 ...................................
[CV]  C=0.001, epochs=50, eta=1e-05, score=0.2615614798694233, total=   1.3s
layers: 1 neurons per layer: (30,)
[CV] C=0.001, epochs=50, eta=1e-05 ...................................
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    1.3s remaining:    0.0s
[CV]  C=0.001, epochs=50, eta=1e-05, score=0.34133569096844396, total=   1.3s
layers: 1 neurons per layer: (30,)
[CV] C=0.001, epochs=50, eta=1e-05 ...................................
[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:    2.6s remaining:    0.0s
[CV]  C=0.001, epochs=50, eta=1e-05, score=0.3977829162132753, total=   1.2s
layers: 1 neurons per layer: (30,)
[CV] C=0.001, epochs=50, eta=0.001 ...................................
[CV]  C=0.001, epochs=50, eta=0.001, score=0.3909140369967356, total=   1.1s
layers: 1 neurons per layer: (30,)
[CV] C=0.001, epochs=50, eta=0.001 ...................................
[CV]  C=0.001, epochs=50, eta=0.001, score=0.2736670293797606, total=   0.0s
layers: 1 neurons per layer: (30,)
[CV] C=0.001, epochs=50, eta=0.001 ...................................
[CV]  C=0.001, epochs=50, eta=0.001, score=0.31413220892274213, total=   1.1s
layers: 1 neurons per layer: (30,)
[CV] C=0.001, epochs=50, eta=0.1 .....................................
[CV]  C=0.001, epochs=50, eta=0.1, score=0.15383569096844396, total=   1.2s
layers: 1 neurons per layer: (30,)
[CV] C=0.001, epochs=50, eta=0.1 .....................................
[CV]  C=0.001, epochs=50, eta=0.1, score=0.19654515778019588, total=   1.4s
layers: 1 neurons per layer: (30,)
[CV] C=0.001, epochs=50, eta=0.1 .....................................
[CV]  C=0.001, epochs=50, eta=0.1, score=0.2005576713819369, total=   1.1s
layers: 1 neurons per layer: (30,)
[CV] C=0.001, epochs=50, eta=10.0 ....................................
[CV]  C=0.001, epochs=50, eta=10.0, score=0.2005576713819369, total=   2.1s
layers: 1 neurons per layer: (30,)
[CV] C=0.001, epochs=50, eta=10.0 ....................................
[CV]  C=0.001, epochs=50, eta=10.0, score=0.19742927094668117, total=   2.2s
layers: 1 neurons per layer: (30,)
[CV] C=0.001, epochs=50, eta=10.0 ....................................
[CV]  C=0.001, epochs=50, eta=10.0, score=0.31902883569096846, total=   2.4s
layers: 1 neurons per layer: (30,)
[CV] C=0.001, epochs=100, eta=1e-05 ..................................
[CV]  C=0.001, epochs=100, eta=1e-05, score=0.397442872687704, total=   2.4s
layers: 1 neurons per layer: (30,)
[CV] C=0.001, epochs=100, eta=1e-05 ..................................
[CV]  C=0.001, epochs=100, eta=1e-05, score=0.39649075081610446, total=   2.2s
layers: 1 neurons per layer: (30,)
[CV] C=0.001, epochs=100, eta=1e-05 ..................................
[CV]  C=0.001, epochs=100, eta=1e-05, score=0.3116158868335147, total=   2.2s
layers: 1 neurons per layer: (30,)
[CV] C=0.001, epochs=100, eta=0.001 ..................................
[CV]  C=0.001, epochs=100, eta=0.001, score=0.5690288356909684, total=   2.3s
layers: 1 neurons per layer: (30,)
[CV] C=0.001, epochs=100, eta=0.001 ..................................
[CV]  C=0.001, epochs=100, eta=0.001, score=0.37180359085963005, total=   2.5s
layers: 1 neurons per layer: (30,)
[CV] C=0.001, epochs=100, eta=0.001 ..................................
[CV]  C=0.001, epochs=100, eta=0.001, score=0.5603917301414582, total=   2.7s
layers: 1 neurons per layer: (30,)
[CV] C=0.001, epochs=100, eta=0.1 ....................................
[CV]  C=0.001, epochs=100, eta=0.1, score=0.2005576713819369, total=   3.2s
layers: 1 neurons per layer: (30,)
[CV] C=0.001, epochs=100, eta=0.1 ....................................
[CV]  C=0.001, epochs=100, eta=0.1, score=0.22680903155603918, total=   2.5s
layers: 1 neurons per layer: (30,)
[CV] C=0.001, epochs=100, eta=0.1 ....................................
[CV]  C=0.001, epochs=100, eta=0.1, score=0.17947497279651795, total=   2.6s
layers: 1 neurons per layer: (30,)
[CV] C=0.001, epochs=100, eta=10.0 ...................................
[CV]  C=0.001, epochs=100, eta=10.0, score=0.2005576713819369, total=   4.8s
layers: 1 neurons per layer: (30,)
[CV] C=0.001, epochs=100, eta=10.0 ...................................
[CV]  C=0.001, epochs=100, eta=10.0, score=0.2005576713819369, total=   5.3s
layers: 1 neurons per layer: (30,)
[CV] C=0.001, epochs=100, eta=10.0 ...................................
[CV]  C=0.001, epochs=100, eta=10.0, score=0.19654515778019588, total=   4.8s
layers: 1 neurons per layer: (30,)
[CV] C=0.001, epochs=300, eta=1e-05 ..................................
[CV]  C=0.001, epochs=300, eta=1e-05, score=0.4736126224156692, total=   6.8s
layers: 1 neurons per layer: (30,)
[CV] C=0.001, epochs=300, eta=1e-05 ..................................
[CV]  C=0.001, epochs=300, eta=1e-05, score=0.44491294885745375, total=   7.1s
layers: 1 neurons per layer: (30,)
[CV] C=0.001, epochs=300, eta=1e-05 ..................................
[CV]  C=0.001, epochs=300, eta=1e-05, score=0.49870783460282914, total=   7.0s
layers: 1 neurons per layer: (30,)
[CV] C=0.001, epochs=300, eta=0.001 ..................................
[CV]  C=0.001, epochs=300, eta=0.001, score=0.6972252448313384, total=   7.3s
layers: 1 neurons per layer: (30,)
[CV] C=0.001, epochs=300, eta=0.001 ..................................
[CV]  C=0.001, epochs=300, eta=0.001, score=0.6933487486398259, total=   6.8s
layers: 1 neurons per layer: (30,)
[CV] C=0.001, epochs=300, eta=0.001 ..................................
[CV]  C=0.001, epochs=300, eta=0.001, score=0.5873911860718172, total=   7.0s
layers: 1 neurons per layer: (30,)
[CV] C=0.001, epochs=300, eta=0.1 ....................................
[CV]  C=0.001, epochs=300, eta=0.1, score=0.19654515778019588, total=   7.7s
layers: 1 neurons per layer: (30,)
[CV] C=0.001, epochs=300, eta=0.1 ....................................
[CV]  C=0.001, epochs=300, eta=0.1, score=0.19069640914036998, total=   8.1s
layers: 1 neurons per layer: (30,)
[CV] C=0.001, epochs=300, eta=0.1 ....................................
[CV]  C=0.001, epochs=300, eta=0.1, score=0.20960282916213274, total=   7.9s
layers: 1 neurons per layer: (30,)
[CV] C=0.001, epochs=300, eta=10.0 ...................................
[CV]  C=0.001, epochs=300, eta=10.0, score=0.19090043525571274, total=  14.2s
layers: 1 neurons per layer: (30,)
[CV] C=0.001, epochs=300, eta=10.0 ...................................
[CV]  C=0.001, epochs=300, eta=10.0, score=0.19654515778019588, total=  13.8s
layers: 1 neurons per layer: (30,)
[CV] C=0.001, epochs=300, eta=10.0 ...................................
[CV]  C=0.001, epochs=300, eta=10.0, score=0.2005576713819369, total=  13.1s
layers: 1 neurons per layer: (30,)
[CV] C=0.0215443469003, epochs=50, eta=1e-05 .........................
[CV]  C=0.0215443469003, epochs=50, eta=1e-05, score=0.24469532100108815, total=   1.0s
layers: 1 neurons per layer: (30,)
[CV] C=0.0215443469003, epochs=50, eta=1e-05 .........................
[CV]  C=0.0215443469003, epochs=50, eta=1e-05, score=0.327733949945593, total=   1.0s
layers: 1 neurons per layer: (30,)
[CV] C=0.0215443469003, epochs=50, eta=1e-05 .........................
[CV]  C=0.0215443469003, epochs=50, eta=1e-05, score=0.29440968443960824, total=   1.0s
layers: 1 neurons per layer: (30,)
[CV] C=0.0215443469003, epochs=50, eta=0.001 .........................
[CV]  C=0.0215443469003, epochs=50, eta=0.001, score=0.3471164309031556, total=   1.0s
layers: 1 neurons per layer: (30,)
[CV] C=0.0215443469003, epochs=50, eta=0.001 .........................
[CV]  C=0.0215443469003, epochs=50, eta=0.001, score=0.3509929270946681, total=   1.0s
layers: 1 neurons per layer: (30,)
[CV] C=0.0215443469003, epochs=50, eta=0.001 .........................
[CV]  C=0.0215443469003, epochs=50, eta=0.001, score=0.3109357997823721, total=   1.0s
layers: 1 neurons per layer: (30,)
[CV] C=0.0215443469003, epochs=50, eta=0.1 ...........................
[CV]  C=0.0215443469003, epochs=50, eta=0.1, score=0.20769858541893363, total=   1.2s
layers: 1 neurons per layer: (30,)
[CV] C=0.0215443469003, epochs=50, eta=0.1 ...........................
[CV]  C=0.0215443469003, epochs=50, eta=0.1, score=0.19090043525571274, total=   1.2s
layers: 1 neurons per layer: (30,)
[CV] C=0.0215443469003, epochs=50, eta=0.1 ...........................
[CV]  C=0.0215443469003, epochs=50, eta=0.1, score=0.20239390642002175, total=   1.3s
layers: 1 neurons per layer: (30,)
[CV] C=0.0215443469003, epochs=50, eta=10.0 ..........................
[CV]  C=0.0215443469003, epochs=50, eta=10.0, score=0.19654515778019588, total=   2.0s
layers: 1 neurons per layer: (30,)
[CV] C=0.0215443469003, epochs=50, eta=10.0 ..........................
[CV]  C=0.0215443469003, epochs=50, eta=10.0, score=0.20960282916213274, total=   2.1s
layers: 1 neurons per layer: (30,)
[CV] C=0.0215443469003, epochs=50, eta=10.0 ..........................
[CV]  C=0.0215443469003, epochs=50, eta=10.0, score=0.19090043525571274, total=   1.9s
layers: 1 neurons per layer: (30,)
[CV] C=0.0215443469003, epochs=100, eta=1e-05 ........................
[CV]  C=0.0215443469003, epochs=100, eta=1e-05, score=0.3339227421109902, total=   2.2s
layers: 1 neurons per layer: (30,)
[CV] C=0.0215443469003, epochs=100, eta=1e-05 ........................
[CV]  C=0.0215443469003, epochs=100, eta=1e-05, score=0.334942872687704, total=   2.2s
layers: 1 neurons per layer: (30,)
[CV] C=0.0215443469003, epochs=100, eta=1e-05 ........................
[CV]  C=0.0215443469003, epochs=100, eta=1e-05, score=0.33807127312295976, total=   2.2s
layers: 1 neurons per layer: (30,)
[CV] C=0.0215443469003, epochs=100, eta=0.001 ........................
[CV]  C=0.0215443469003, epochs=100, eta=0.001, score=0.37255168661588683, total=   2.2s
layers: 1 neurons per layer: (30,)
[CV] C=0.0215443469003, epochs=100, eta=0.001 ........................
[CV]  C=0.0215443469003, epochs=100, eta=0.001, score=0.3177366702937976, total=   2.3s
layers: 1 neurons per layer: (30,)
[CV] C=0.0215443469003, epochs=100, eta=0.001 ........................
[CV]  C=0.0215443469003, epochs=100, eta=0.001, score=0.5348204570184983, total=   2.3s
layers: 1 neurons per layer: (30,)
[CV] C=0.0215443469003, epochs=100, eta=0.1 ..........................
[CV]  C=0.0215443469003, epochs=100, eta=0.1, score=0.17328618063112078, total=   2.6s
layers: 1 neurons per layer: (30,)
[CV] C=0.0215443469003, epochs=100, eta=0.1 ..........................
[CV]  C=0.0215443469003, epochs=100, eta=0.1, score=0.20960282916213274, total=   2.6s
layers: 1 neurons per layer: (30,)
[CV] C=0.0215443469003, epochs=100, eta=0.1 ..........................
[CV]  C=0.0215443469003, epochs=100, eta=0.1, score=0.19090043525571274, total=   2.4s
layers: 1 neurons per layer: (30,)
[CV] C=0.0215443469003, epochs=100, eta=10.0 .........................
[CV]  C=0.0215443469003, epochs=100, eta=10.0, score=0.2005576713819369, total=   2.8s
layers: 1 neurons per layer: (30,)
[CV] C=0.0215443469003, epochs=100, eta=10.0 .........................
[CV]  C=0.0215443469003, epochs=100, eta=10.0, score=0.19090043525571274, total=   3.1s
layers: 1 neurons per layer: (30,)
[CV] C=0.0215443469003, epochs=100, eta=10.0 .........................
[CV]  C=0.0215443469003, epochs=100, eta=10.0, score=0.2005576713819369, total=   2.9s
layers: 1 neurons per layer: (30,)
[CV] C=0.0215443469003, epochs=300, eta=1e-05 ........................
[CV]  C=0.0215443469003, epochs=300, eta=1e-05, score=0.492519042437432, total=   6.7s
layers: 1 neurons per layer: (30,)
[CV] C=0.0215443469003, epochs=300, eta=1e-05 ........................
[CV]  C=0.0215443469003, epochs=300, eta=1e-05, score=0.5168661588683352, total=   7.1s
layers: 1 neurons per layer: (30,)
[CV] C=0.0215443469003, epochs=300, eta=1e-05 ........................
[CV]  C=0.0215443469003, epochs=300, eta=1e-05, score=0.4725924918389554, total=   7.0s
layers: 1 neurons per layer: (30,)
[CV] C=0.0215443469003, epochs=300, eta=0.001 ........................
[CV]  C=0.0215443469003, epochs=300, eta=0.001, score=0.6316648531011969, total=   6.9s
layers: 1 neurons per layer: (30,)
[CV] C=0.0215443469003, epochs=300, eta=0.001 ........................
[CV]  C=0.0215443469003, epochs=300, eta=0.001, score=0.44838139281828077, total=   7.0s
layers: 1 neurons per layer: (30,)
[CV] C=0.0215443469003, epochs=300, eta=0.001 ........................
[CV]  C=0.0215443469003, epochs=300, eta=0.001, score=0.6890642002176278, total=   7.2s
layers: 1 neurons per layer: (30,)
[CV] C=0.0215443469003, epochs=300, eta=0.1 ..........................
[CV]  C=0.0215443469003, epochs=300, eta=0.1, score=0.2005576713819369, total=   7.8s
layers: 1 neurons per layer: (30,)
[CV] C=0.0215443469003, epochs=300, eta=0.1 ..........................
[CV]  C=0.0215443469003, epochs=300, eta=0.1, score=0.1853917301414581, total=   7.3s
layers: 1 neurons per layer: (30,)
[CV] C=0.0215443469003, epochs=300, eta=0.1 ..........................
[CV]  C=0.0215443469003, epochs=300, eta=0.1, score=0.20239390642002175, total=   7.8s
layers: 1 neurons per layer: (30,)
[CV] C=0.0215443469003, epochs=300, eta=10.0 .........................
[CV]  C=0.0215443469003, epochs=300, eta=10.0, score=0.2005576713819369, total=   3.1s
layers: 1 neurons per layer: (30,)
[CV] C=0.0215443469003, epochs=300, eta=10.0 .........................
[CV]  C=0.0215443469003, epochs=300, eta=10.0, score=0.2005576713819369, total=   2.9s
layers: 1 neurons per layer: (30,)
[CV] C=0.0215443469003, epochs=300, eta=10.0 .........................
[CV]  C=0.0215443469003, epochs=300, eta=10.0, score=0.19090043525571274, total=   2.9s
layers: 1 neurons per layer: (30,)
[CV] C=0.464158883361, epochs=50, eta=1e-05 ..........................
[CV]  C=0.464158883361, epochs=50, eta=1e-05, score=0.37452393906420023, total=   1.2s
layers: 1 neurons per layer: (30,)
[CV] C=0.464158883361, epochs=50, eta=1e-05 ..........................
[CV]  C=0.464158883361, epochs=50, eta=1e-05, score=0.3745919477693145, total=   1.2s
layers: 1 neurons per layer: (30,)
[CV] C=0.464158883361, epochs=50, eta=1e-05 ..........................
[CV]  C=0.464158883361, epochs=50, eta=1e-05, score=0.2196001088139282, total=   1.1s
layers: 1 neurons per layer: (30,)
[CV] C=0.464158883361, epochs=50, eta=0.001 ..........................
[CV]  C=0.464158883361, epochs=50, eta=0.001, score=0.26081338411316646, total=   1.1s
layers: 1 neurons per layer: (30,)
[CV] C=0.464158883361, epochs=50, eta=0.001 ..........................
[CV]  C=0.464158883361, epochs=50, eta=0.001, score=0.23197769314472252, total=   1.1s
layers: 1 neurons per layer: (30,)
[CV] C=0.464158883361, epochs=50, eta=0.001 ..........................
[CV]  C=0.464158883361, epochs=50, eta=0.001, score=0.2729869423286181, total=   1.1s
layers: 1 neurons per layer: (30,)
[CV] C=0.464158883361, epochs=50, eta=0.1 ............................
[CV]  C=0.464158883361, epochs=50, eta=0.1, score=0.2005576713819369, total=   1.5s
layers: 1 neurons per layer: (30,)
[CV] C=0.464158883361, epochs=50, eta=0.1 ............................
[CV]  C=0.464158883361, epochs=50, eta=0.1, score=0.19899347116430904, total=   2.1s
layers: 1 neurons per layer: (30,)
[CV] C=0.464158883361, epochs=50, eta=0.1 ............................
[CV]  C=0.464158883361, epochs=50, eta=0.1, score=0.2005576713819369, total=   1.8s
layers: 1 neurons per layer: (30,)
[CV] C=0.464158883361, epochs=50, eta=10.0 ...........................
[CV]  C=0.464158883361, epochs=50, eta=10.0, score=0.19654515778019588, total=   3.5s
layers: 1 neurons per layer: (30,)
[CV] C=0.464158883361, epochs=50, eta=10.0 ...........................
[CV]  C=0.464158883361, epochs=50, eta=10.0, score=0.2005576713819369, total=   3.4s
layers: 1 neurons per layer: (30,)
[CV] C=0.464158883361, epochs=50, eta=10.0 ...........................
[CV]  C=0.464158883361, epochs=50, eta=10.0, score=0.20960282916213274, total=   3.4s
layers: 1 neurons per layer: (30,)
[CV] C=0.464158883361, epochs=100, eta=1e-05 .........................
[CV]  C=0.464158883361, epochs=100, eta=1e-05, score=0.1853917301414581, total=   3.4s
layers: 1 neurons per layer: (30,)
[CV] C=0.464158883361, epochs=100, eta=1e-05 .........................
[CV]  C=0.464158883361, epochs=100, eta=1e-05, score=0.34840859630032645, total=   3.1s
layers: 1 neurons per layer: (30,)
[CV] C=0.464158883361, epochs=100, eta=1e-05 .........................
[CV]  C=0.464158883361, epochs=100, eta=1e-05, score=0.3336507072905332, total=   3.7s
layers: 1 neurons per layer: (30,)
[CV] C=0.464158883361, epochs=100, eta=0.001 .........................
[CV]  C=0.464158883361, epochs=100, eta=0.001, score=0.28488846572361265, total=   3.6s
layers: 1 neurons per layer: (30,)
[CV] C=0.464158883361, epochs=100, eta=0.001 .........................
[CV]  C=0.464158883361, epochs=100, eta=0.001, score=0.24299510337323177, total=   3.6s
layers: 1 neurons per layer: (30,)
[CV] C=0.464158883361, epochs=100, eta=0.001 .........................
[CV]  C=0.464158883361, epochs=100, eta=0.001, score=0.5231229597388466, total=   3.5s
layers: 1 neurons per layer: (30,)
[CV] C=0.464158883361, epochs=100, eta=0.1 ...........................
[CV]  C=0.464158883361, epochs=100, eta=0.1, score=0.19654515778019588, total=   3.3s
layers: 1 neurons per layer: (30,)
[CV] C=0.464158883361, epochs=100, eta=0.1 ...........................
[CV]  C=0.464158883361, epochs=100, eta=0.1, score=0.20960282916213274, total=   3.5s
layers: 1 neurons per layer: (30,)
[CV] C=0.464158883361, epochs=100, eta=0.1 ...........................
[CV]  C=0.464158883361, epochs=100, eta=0.1, score=0.19654515778019588, total=   2.9s
layers: 1 neurons per layer: (30,)
[CV] C=0.464158883361, epochs=100, eta=10.0 ..........................
[CV]  C=0.464158883361, epochs=100, eta=10.0, score=0.2005576713819369, total=   7.1s
layers: 1 neurons per layer: (30,)
[CV] C=0.464158883361, epochs=100, eta=10.0 ..........................
[CV]  C=0.464158883361, epochs=100, eta=10.0, score=0.2005576713819369, total=   7.2s
layers: 1 neurons per layer: (30,)
[CV] C=0.464158883361, epochs=100, eta=10.0 ..........................
[CV]  C=0.464158883361, epochs=100, eta=10.0, score=0.2005576713819369, total=   5.3s
layers: 1 neurons per layer: (30,)
[CV] C=0.464158883361, epochs=300, eta=1e-05 .........................
[CV]  C=0.464158883361, epochs=300, eta=1e-05, score=0.49319912948857453, total=   8.9s
layers: 1 neurons per layer: (30,)
[CV] C=0.464158883361, epochs=300, eta=1e-05 .........................
[CV]  C=0.464158883361, epochs=300, eta=1e-05, score=0.5155739934711643, total=   8.0s
layers: 1 neurons per layer: (30,)
[CV] C=0.464158883361, epochs=300, eta=1e-05 .........................
[CV]  C=0.464158883361, epochs=300, eta=1e-05, score=0.49931991294885747, total=   8.4s
layers: 1 neurons per layer: (30,)
[CV] C=0.464158883361, epochs=300, eta=0.001 .........................
[CV]  C=0.464158883361, epochs=300, eta=0.001, score=0.6903563656147987, total=   7.8s
layers: 1 neurons per layer: (30,)
[CV] C=0.464158883361, epochs=300, eta=0.001 .........................
[CV]  C=0.464158883361, epochs=300, eta=0.001, score=0.5675326441784548, total=   8.8s
layers: 1 neurons per layer: (30,)
[CV] C=0.464158883361, epochs=300, eta=0.001 .........................
[CV]  C=0.464158883361, epochs=300, eta=0.001, score=0.6983133841131665, total=  10.7s
layers: 1 neurons per layer: (30,)
[CV] C=0.464158883361, epochs=300, eta=0.1 ...........................
[CV]  C=0.464158883361, epochs=300, eta=0.1, score=0.20960282916213274, total=   8.8s
layers: 1 neurons per layer: (30,)
[CV] C=0.464158883361, epochs=300, eta=0.1 ...........................
[CV]  C=0.464158883361, epochs=300, eta=0.1, score=0.19090043525571274, total=   9.3s
layers: 1 neurons per layer: (30,)
[CV] C=0.464158883361, epochs=300, eta=0.1 ...........................
[CV]  C=0.464158883361, epochs=300, eta=0.1, score=0.20960282916213274, total=   8.0s
layers: 1 neurons per layer: (30,)
[CV] C=0.464158883361, epochs=300, eta=10.0 ..........................
C:\Users\leima\AppData\Local\conda\conda\envs\my_root\lib\site-packages\ipykernel_launcher.py:328: RuntimeWarning:

invalid value encountered in double_scalars

C:\Users\leima\AppData\Local\conda\conda\envs\my_root\lib\site-packages\ipykernel_launcher.py:104: RuntimeWarning:

overflow encountered in square

[CV]  C=0.464158883361, epochs=300, eta=10.0, score=0.2005576713819369, total=  16.6s
layers: 1 neurons per layer: (30,)
[CV] C=0.464158883361, epochs=300, eta=10.0 ..........................
[CV]  C=0.464158883361, epochs=300, eta=10.0, score=0.2005576713819369, total=  15.7s
layers: 1 neurons per layer: (30,)
[CV] C=0.464158883361, epochs=300, eta=10.0 ..........................
[CV]  C=0.464158883361, epochs=300, eta=10.0, score=0.19654515778019588, total=  16.0s
layers: 1 neurons per layer: (30,)
[CV] C=10.0, epochs=50, eta=1e-05 ....................................
[CV]  C=10.0, epochs=50, eta=1e-05, score=0.25374047878128403, total=   1.2s
layers: 1 neurons per layer: (30,)
[CV] C=10.0, epochs=50, eta=1e-05 ....................................
[CV]  C=10.0, epochs=50, eta=1e-05, score=0.2643498367791077, total=   1.2s
layers: 1 neurons per layer: (30,)
[CV] C=10.0, epochs=50, eta=1e-05 ....................................
[CV]  C=10.0, epochs=50, eta=1e-05, score=0.25666485310119697, total=   1.2s
layers: 1 neurons per layer: (30,)
[CV] C=10.0, epochs=50, eta=0.001 ....................................
[CV]  C=10.0, epochs=50, eta=0.001, score=0.4314472252448313, total=   1.4s
layers: 1 neurons per layer: (30,)
[CV] C=10.0, epochs=50, eta=0.001 ....................................
[CV]  C=10.0, epochs=50, eta=0.001, score=0.3300462459194777, total=   1.4s
layers: 1 neurons per layer: (30,)
[CV] C=10.0, epochs=50, eta=0.001 ....................................
[CV]  C=10.0, epochs=50, eta=0.001, score=0.4755848748639826, total=   1.3s
layers: 1 neurons per layer: (30,)
[CV] C=10.0, epochs=50, eta=0.1 ......................................
[CV]  C=10.0, epochs=50, eta=0.1, score=0.20239390642002175, total=   0.0s
layers: 1 neurons per layer: (30,)
[CV] C=10.0, epochs=50, eta=0.1 ......................................
[CV]  C=10.0, epochs=50, eta=0.1, score=0.19090043525571274, total=   1.3s
layers: 1 neurons per layer: (30,)
[CV] C=10.0, epochs=50, eta=0.1 ......................................
[CV]  C=10.0, epochs=50, eta=0.1, score=0.19654515778019588, total=   0.1s
layers: 1 neurons per layer: (30,)
[CV] C=10.0, epochs=50, eta=10.0 .....................................
[CV]  C=10.0, epochs=50, eta=10.0, score=0.2005576713819369, total=   2.4s
layers: 1 neurons per layer: (30,)
[CV] C=10.0, epochs=50, eta=10.0 .....................................
[CV]  C=10.0, epochs=50, eta=10.0, score=0.2005576713819369, total=   2.5s
layers: 1 neurons per layer: (30,)
[CV] C=10.0, epochs=50, eta=10.0 .....................................
[CV]  C=10.0, epochs=50, eta=10.0, score=0.2005576713819369, total=   2.5s
layers: 1 neurons per layer: (30,)
[CV] C=10.0, epochs=100, eta=1e-05 ...................................
[CV]  C=10.0, epochs=100, eta=1e-05, score=0.42573449401523394, total=   2.6s
layers: 1 neurons per layer: (30,)
[CV] C=10.0, epochs=100, eta=1e-05 ...................................
[CV]  C=10.0, epochs=100, eta=1e-05, score=0.404379760609358, total=   3.0s
layers: 1 neurons per layer: (30,)
[CV] C=10.0, epochs=100, eta=1e-05 ...................................
[CV]  C=10.0, epochs=100, eta=1e-05, score=0.4443688792165397, total=   2.6s
layers: 1 neurons per layer: (30,)
[CV] C=10.0, epochs=100, eta=0.001 ...................................
[CV]  C=10.0, epochs=100, eta=0.001, score=0.5594396082698585, total=   3.2s
layers: 1 neurons per layer: (30,)
[CV] C=10.0, epochs=100, eta=0.001 ...................................
[CV]  C=10.0, epochs=100, eta=0.001, score=0.5544749727965179, total=   2.6s
layers: 1 neurons per layer: (30,)
[CV] C=10.0, epochs=100, eta=0.001 ...................................
[CV]  C=10.0, epochs=100, eta=0.001, score=0.5316920565832427, total=   2.6s
layers: 1 neurons per layer: (30,)
[CV] C=10.0, epochs=100, eta=0.1 .....................................
[CV]  C=10.0, epochs=100, eta=0.1, score=0.2005576713819369, total=   3.0s
layers: 1 neurons per layer: (30,)
[CV] C=10.0, epochs=100, eta=0.1 .....................................
[CV]  C=10.0, epochs=100, eta=0.1, score=0.2005576713819369, total=   0.0s
layers: 1 neurons per layer: (30,)
[CV] C=10.0, epochs=100, eta=0.1 .....................................
[CV]  C=10.0, epochs=100, eta=0.1, score=0.19654515778019588, total=   0.0s
layers: 1 neurons per layer: (30,)
[CV] C=10.0, epochs=100, eta=10.0 ....................................
[CV]  C=10.0, epochs=100, eta=10.0, score=0.2005576713819369, total=   5.0s
layers: 1 neurons per layer: (30,)
[CV] C=10.0, epochs=100, eta=10.0 ....................................
[CV]  C=10.0, epochs=100, eta=10.0, score=0.2005576713819369, total=   5.0s
layers: 1 neurons per layer: (30,)
[CV] C=10.0, epochs=100, eta=10.0 ....................................
[CV]  C=10.0, epochs=100, eta=10.0, score=0.2005576713819369, total=   5.4s
layers: 1 neurons per layer: (30,)
[CV] C=10.0, epochs=300, eta=1e-05 ...................................
[CV]  C=10.0, epochs=300, eta=1e-05, score=0.4978917301414581, total=   8.4s
layers: 1 neurons per layer: (30,)
[CV] C=10.0, epochs=300, eta=1e-05 ...................................
[CV]  C=10.0, epochs=300, eta=1e-05, score=0.4794613710554951, total=   8.4s
layers: 1 neurons per layer: (30,)
[CV] C=10.0, epochs=300, eta=1e-05 ...................................
[CV]  C=10.0, epochs=300, eta=1e-05, score=0.4838139281828074, total=   8.2s
layers: 1 neurons per layer: (30,)
[CV] C=10.0, epochs=300, eta=0.001 ...................................
[CV]  C=10.0, epochs=300, eta=0.001, score=0.6383297062023939, total=   8.4s
layers: 1 neurons per layer: (30,)
[CV] C=10.0, epochs=300, eta=0.001 ...................................
[CV]  C=10.0, epochs=300, eta=0.001, score=0.6499591947769314, total=   8.0s
layers: 1 neurons per layer: (30,)
[CV] C=10.0, epochs=300, eta=0.001 ...................................
[CV]  C=10.0, epochs=300, eta=0.001, score=0.6671653971708379, total=   8.6s
layers: 1 neurons per layer: (30,)
[CV] C=10.0, epochs=300, eta=0.1 .....................................
[CV]  C=10.0, epochs=300, eta=0.1, score=0.20960282916213274, total=   7.6s
layers: 1 neurons per layer: (30,)
[CV] C=10.0, epochs=300, eta=0.1 .....................................
[CV]  C=10.0, epochs=300, eta=0.1, score=0.2005576713819369, total=   0.0s
layers: 1 neurons per layer: (30,)
[CV] C=10.0, epochs=300, eta=0.1 .....................................
[CV]  C=10.0, epochs=300, eta=0.1, score=0.19654515778019588, total=   0.1s
layers: 1 neurons per layer: (30,)
[CV] C=10.0, epochs=300, eta=10.0 ....................................
C:\Users\leima\AppData\Local\conda\conda\envs\my_root\lib\site-packages\ipykernel_launcher.py:104: RuntimeWarning:

overflow encountered in square

C:\Users\leima\AppData\Local\conda\conda\envs\my_root\lib\site-packages\ipykernel_launcher.py:328: RuntimeWarning:

invalid value encountered in double_scalars

C:\Users\leima\AppData\Local\conda\conda\envs\my_root\lib\site-packages\ipykernel_launcher.py:167: RuntimeWarning:

overflow encountered in multiply

C:\Users\leima\AppData\Local\conda\conda\envs\my_root\lib\site-packages\ipykernel_launcher.py:320: RuntimeWarning:

overflow encountered in multiply

C:\Users\leima\AppData\Local\conda\conda\envs\my_root\lib\site-packages\ipykernel_launcher.py:189: RuntimeWarning:

overflow encountered in multiply

[CV]  C=10.0, epochs=300, eta=10.0, score=0.2005576713819369, total=  12.1s
layers: 1 neurons per layer: (30,)
[CV] C=10.0, epochs=300, eta=10.0 ....................................
[CV]  C=10.0, epochs=300, eta=10.0, score=0.2005576713819369, total=  12.0s
layers: 1 neurons per layer: (30,)
[CV] C=10.0, epochs=300, eta=10.0 ....................................
[CV]  C=10.0, epochs=300, eta=10.0, score=0.2005576713819369, total=  11.5s
layers: 1 neurons per layer: (30,)
[Parallel(n_jobs=1)]: Done 144 out of 144 | elapsed: 10.7min finished
Out[22]:
GridSearchCV(cv=None, error_score='raise',
       estimator=<__main__.MultiLayerPerceptron object at 0x000001EAB015F6D8>,
       fit_params=None, iid=True, n_jobs=1,
       param_grid={'epochs': [50, 100, 300], 'eta': array([  1.00000e-05,   1.00000e-03,   1.00000e-01,   1.00000e+01]), 'C': array([  1.00000e-03,   2.15443e-02,   4.64159e-01,   1.00000e+01])},
       pre_dispatch='2*n_jobs', refit=True, return_train_score=True,
       scoring=None, verbose=3)
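The `param_grid` in the repr above is built from log-spaced sweeps. As a sanity check, the sketch below (plain `numpy` and `itertools`, independent of our grid-search call) reproduces the 48 parameter settings, which at 3 CV folds each accounts for the 144 fits reported by `[Parallel]`:

```python
import itertools
import numpy as np

# Same grid as in the GridSearchCV repr above
param_grid = {
    'epochs': [50, 100, 300],
    'eta': np.logspace(-5, 1, 4),  # 1e-05, 1e-03, 1e-01, 1e+01
    'C': np.logspace(-3, 1, 4),    # 0.001, 0.0215443..., 0.4641588..., 10.0
}

# Every combination the search visits
combos = list(itertools.product(*param_grid.values()))
print(len(combos))      # 48 parameter settings
print(len(combos) * 3)  # 144 fits with 3-fold cross-validation
```

Note that the odd-looking constants in the log (`C=0.0215443469003`, `C=0.464158883361`) are just the interior points of `np.logspace(-3, 1, 4)`.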
In [23]:
# Load the cached grid-search results (written out once after the search finished)
cvres = pd.read_csv(os.path.join(data_path, 'hyper_param_gs.csv'))
# The cache was created from the fitted GridSearchCV object:
#cvres = pd.DataFrame(clf.cv_results_)
#cvres.to_csv(os.path.join(data_path, 'hyper_param_gs.csv'))
cvres
Out[23]:
Unnamed: 0 mean_fit_time mean_score_time mean_test_score mean_train_score param_C param_epochs param_eta params rank_test_score split0_test_score split0_train_score split1_test_score split1_train_score split2_test_score split2_train_score std_fit_time std_score_time std_test_score std_train_score
0 0 1.955656 0.018216 0.311983 0.311988 0.001000 50 0.00001 {'C': 0.001, 'epochs': 50, 'eta': 1.0000000000... 20 0.192982 0.192982 0.347008 0.347008 0.395974 0.395974 0.027638 0.002020 0.086492 0.086492
1 1 2.015203 0.019230 0.400738 0.400744 0.001000 50 0.00100 {'C': 0.001, 'epochs': 50, 'eta': 0.001} 12 0.212568 0.212568 0.485745 0.485745 0.503917 0.503917 0.099934 0.002340 0.133268 0.133267
2 2 2.197491 0.020221 0.181611 0.181610 0.001000 50 0.10000 {'C': 0.001, 'epochs': 50, 'eta': 0.1000000000... 47 0.186616 0.186616 0.191458 0.191458 0.166757 0.166757 0.126363 0.002398 0.010687 0.010687
3 3 4.195354 0.045954 0.199891 0.199891 0.001000 50 10.00000 {'C': 0.001, 'epochs': 50, 'eta': 10.0} 31 0.199891 0.199891 0.199891 0.199891 0.199891 0.199891 0.409730 0.007573 0.000000 0.000000
4 4 4.268600 0.016720 0.365015 0.365016 0.001000 100 0.00001 {'C': 0.001, 'epochs': 100, 'eta': 1.000000000... 16 0.346137 0.346137 0.363330 0.363330 0.385582 0.385582 0.422524 0.000248 0.016147 0.016147
5 5 4.426663 0.028912 0.387034 0.387033 0.001000 100 0.00100 {'C': 0.001, 'epochs': 100, 'eta': 0.001} 15 0.472524 0.472524 0.316485 0.316485 0.372089 0.372089 0.157545 0.001251 0.064575 0.064573
6 6 4.811804 0.021558 0.174086 0.174084 0.001000 100 0.10000 {'C': 0.001, 'epochs': 100, 'eta': 0.100000000... 48 0.191458 0.191458 0.199891 0.199891 0.130903 0.130903 0.421360 0.003249 0.030726 0.030727
7 7 6.943836 0.034425 0.197425 0.197425 0.001000 100 10.00000 {'C': 0.001, 'epochs': 100, 'eta': 10.0} 42 0.191458 0.191458 0.200925 0.200925 0.199891 0.199891 0.073769 0.001316 0.004240 0.004240
8 8 11.633611 0.017045 0.516449 0.516449 0.001000 300 0.00001 {'C': 0.001, 'epochs': 300, 'eta': 1.000000000... 4 0.519097 0.519097 0.513058 0.513058 0.517193 0.517193 0.070758 0.000410 0.002521 0.002521
9 9 8.162368 0.018716 0.429049 0.429053 0.001000 300 0.00100 {'C': 0.001, 'epochs': 300, 'eta': 0.001} 8 0.392492 0.392492 0.379217 0.379217 0.515452 0.515452 5.162643 0.002100 0.061331 0.061333
10 10 12.679105 0.019551 0.196808 0.196808 0.001000 300 0.10000 {'C': 0.001, 'epochs': 300, 'eta': 0.100000000... 44 0.200925 0.200925 0.189554 0.189554 0.199946 0.199946 0.269841 0.001228 0.005145 0.005145
11 11 20.673112 0.035093 0.198984 0.198984 0.001000 300 10.00000 {'C': 0.001, 'epochs': 300, 'eta': 10.0} 37 0.199891 0.199891 0.197171 0.197171 0.199891 0.199891 0.275665 0.003569 0.001282 0.001282
12 12 1.958731 0.017221 0.336327 0.336326 0.021544 50 0.00001 {'C': 0.021544346900318832, 'epochs': 50, 'eta... 18 0.390207 0.390207 0.297715 0.297715 0.321055 0.321055 0.034462 0.000479 0.039274 0.039273
13 13 2.105833 0.019886 0.216430 0.216431 0.021544 50 0.00100 {'C': 0.021544346900318832, 'epochs': 50, 'eta... 24 0.201306 0.201306 0.210555 0.210555 0.237432 0.237432 0.113859 0.003668 0.015322 0.015323
14 14 2.216989 0.018550 0.198984 0.198984 0.021544 50 0.10000 {'C': 0.021544346900318832, 'epochs': 50, 'eta... 37 0.199891 0.199891 0.197171 0.197171 0.199891 0.199891 0.063193 0.000709 0.001282 0.001282
15 15 3.244340 0.034759 0.198984 0.198984 0.021544 50 10.00000 {'C': 0.021544346900318832, 'epochs': 50, 'eta... 37 0.197171 0.197171 0.199891 0.199891 0.199891 0.199891 0.068683 0.000945 0.001282 0.001282
16 16 3.905222 0.017379 0.417535 0.417537 0.021544 100 0.00001 {'C': 0.021544346900318832, 'epochs': 100, 'et... 10 0.395919 0.395919 0.406910 0.406910 0.449782 0.449782 0.007794 0.000852 0.023237 0.023238
17 17 4.217253 0.019217 0.280285 0.280287 0.021544 100 0.00100 {'C': 0.021544346900318832, 'epochs': 100, 'et... 22 0.325680 0.325680 0.210664 0.210664 0.304516 0.304516 0.352335 0.001845 0.049984 0.049983
18 18 5.005014 0.023392 0.194269 0.194269 0.021544 100 0.10000 {'C': 0.021544346900318832, 'epochs': 100, 'et... 45 0.199891 0.199891 0.191458 0.191458 0.191458 0.191458 0.593601 0.005789 0.003975 0.003975
19 19 5.299359 0.034259 0.203445 0.203446 0.021544 100 10.00000 {'C': 0.021544346900318832, 'epochs': 100, 'et... 26 0.199891 0.199891 0.199891 0.199891 0.210555 0.210555 0.292952 0.002399 0.005027 0.005027
20 20 11.876183 0.018892 0.510753 0.510754 0.021544 300 0.00001 {'C': 0.021544346900318832, 'epochs': 300, 'et... 5 0.502720 0.502720 0.499129 0.499129 0.530413 0.530413 0.064634 0.002268 0.013978 0.013978
21 21 11.949819 0.018716 0.510493 0.510482 0.021544 300 0.00100 {'C': 0.021544346900318832, 'epochs': 300, 'et... 6 0.537595 0.537595 0.669260 0.669260 0.324592 0.324592 0.068374 0.001316 0.142007 0.142010
22 22 12.995357 0.018215 0.204189 0.204189 0.021544 300 0.10000 {'C': 0.021544346900318832, 'epochs': 300, 'et... 25 0.210555 0.210555 0.191458 0.191458 0.210555 0.210555 0.272871 0.000236 0.009002 0.009002
23 23 5.609468 0.033421 0.198984 0.198984 0.021544 300 10.00000 {'C': 0.021544346900318832, 'epochs': 300, 'et... 37 0.197171 0.197171 0.199891 0.199891 0.199891 0.199891 0.078653 0.002059 0.001282 0.001282
24 24 1.978996 0.017881 0.325898 0.325898 0.464159 50 0.00001 {'C': 0.46415888336127775, 'epochs': 50, 'eta'... 19 0.336997 0.336997 0.318553 0.318553 0.322144 0.322144 0.010390 0.001182 0.007984 0.007984
25 25 1.987448 0.017725 0.272800 0.272797 0.464159 50 0.00100 {'C': 0.46415888336127775, 'epochs': 50, 'eta'... 23 0.398259 0.398259 0.209793 0.209793 0.210337 0.210337 0.012778 0.000641 0.088717 0.088716
26 26 2.053516 0.018716 0.200580 0.200580 0.464159 50 0.10000 {'C': 0.46415888336127775, 'epochs': 50, 'eta'... 30 0.200925 0.200925 0.199891 0.199891 0.200925 0.200925 0.053075 0.001705 0.000487 0.000487
27 27 3.533603 0.035929 0.198985 0.198984 0.464159 50 10.00000 {'C': 0.46415888336127775, 'epochs': 50, 'eta'... 36 0.199891 0.199891 0.199891 0.199891 0.197171 0.197171 0.011478 0.001549 0.001282 0.001282
28 28 4.072100 0.017380 0.423649 0.423649 0.464159 100 0.00001 {'C': 0.46415888336127775, 'epochs': 100, 'eta... 9 0.431447 0.431447 0.426714 0.426714 0.412786 0.412786 0.185187 0.000237 0.007921 0.007921
29 29 3.965784 0.018048 0.301665 0.301668 0.464159 100 0.00100 {'C': 0.46415888336127775, 'epochs': 100, 'eta... 21 0.233950 0.233950 0.313711 0.313711 0.357345 0.357345 0.005565 0.001083 0.051090 0.051090
30 30 4.063311 0.018215 0.201632 0.201632 0.464159 100 0.10000 {'C': 0.46415888336127775, 'epochs': 100, 'eta... 29 0.197171 0.197171 0.210555 0.210555 0.197171 0.197171 0.016725 0.002019 0.006309 0.006309
31 31 8.063534 0.038770 0.199891 0.199891 0.464159 100 10.00000 {'C': 0.46415888336127775, 'epochs': 100, 'eta... 31 0.199891 0.199891 0.199891 0.199891 0.199891 0.199891 0.638002 0.004140 0.000000 0.000000
32 32 13.589641 0.028414 0.535927 0.535927 0.464159 300 0.00001 {'C': 0.46415888336127775, 'epochs': 300, 'eta... 2 0.521110 0.521110 0.551578 0.551578 0.535092 0.535092 0.698140 0.005651 0.012453 0.012452
33 33 12.153470 0.017722 0.450931 0.450925 0.464159 300 0.00100 {'C': 0.46415888336127775, 'epochs': 300, 'eta... 7 0.685147 0.685147 0.328509 0.328509 0.339119 0.339119 0.195714 0.000249 0.165679 0.165677
34 34 12.098597 0.018717 0.202539 0.202539 0.464159 300 0.10000 {'C': 0.46415888336127775, 'epochs': 300, 'eta... 28 0.210555 0.210555 0.199891 0.199891 0.197171 0.197171 0.128994 0.001250 0.005776 0.005776
35 35 21.625251 0.040275 0.199891 0.199891 0.464159 300 10.00000 {'C': 0.46415888336127775, 'epochs': 300, 'eta... 31 0.199891 0.199891 0.199891 0.199891 0.199891 0.199891 0.103060 0.001182 0.000000 0.000000
36 36 2.007950 0.017045 0.351812 0.351814 10.000000 50 0.00001 {'C': 10.0, 'epochs': 50, 'eta': 1.00000000000... 17 0.323993 0.323993 0.357780 0.357780 0.373667 0.373667 0.014913 0.000410 0.020713 0.020713
37 37 2.108693 0.020732 0.390723 0.390715 10.000000 50 0.00100 {'C': 10.0, 'epochs': 50, 'eta': 0.001} 14 0.483569 0.483569 0.460990 0.460990 0.227584 0.227584 0.127167 0.001029 0.115715 0.115718
38 38 0.554848 0.018048 0.202884 0.202884 10.000000 50 0.10000 {'C': 10.0, 'epochs': 50, 'eta': 0.10000000000... 27 0.210555 0.210555 0.200925 0.200925 0.197171 0.197171 0.048906 0.001083 0.005637 0.005637
39 39 5.147771 0.073027 0.193344 0.193344 10.000000 50 10.00000 {'C': 10.0, 'epochs': 50, 'eta': 10.0} 46 0.180250 0.180250 0.199891 0.199891 0.199891 0.199891 1.216325 0.046280 0.009259 0.009259
40 40 7.831627 0.023395 0.397607 0.397606 10.000000 100 0.00001 {'C': 10.0, 'epochs': 100, 'eta': 1.0000000000... 13 0.408487 0.408487 0.401795 0.401795 0.382535 0.382535 3.201335 0.004338 0.011001 0.011001
41 41 4.013095 0.019061 0.404167 0.404171 10.000000 100 0.00100 {'C': 10.0, 'epochs': 100, 'eta': 0.001} 11 0.405767 0.405767 0.318553 0.318553 0.488194 0.488194 0.006358 0.001228 0.069264 0.069265
42 42 1.702913 0.018230 0.198422 0.198422 10.000000 100 0.10000 {'C': 10.0, 'epochs': 100, 'eta': 0.1000000000... 41 0.200925 0.200925 0.197171 0.197171 0.197171 0.197171 1.667391 0.001337 0.001770 0.001770
43 43 7.205254 0.035596 0.199891 0.199891 10.000000 100 10.00000 {'C': 10.0, 'epochs': 100, 'eta': 10.0} 31 0.199891 0.199891 0.199891 0.199891 0.199891 0.199891 0.020429 0.000412 0.000000 0.000000
44 44 14.733038 0.020052 0.529508 0.529507 10.000000 300 0.00001 {'C': 10.0, 'epochs': 300, 'eta': 1.0000000000... 3 0.524102 0.524102 0.553700 0.553700 0.510718 0.510718 1.440652 0.001228 0.017958 0.017958
45 45 13.537390 0.019051 0.654734 0.654733 10.000000 300 0.00100 {'C': 10.0, 'epochs': 300, 'eta': 0.001} 1 0.655658 0.655658 0.657508 0.657508 0.651034 0.651034 0.865686 0.002489 0.002723 0.002723
46 46 0.627052 0.022727 0.197425 0.197425 10.000000 300 0.10000 {'C': 10.0, 'epochs': 300, 'eta': 0.1000000000... 43 0.191458 0.191458 0.199891 0.199891 0.200925 0.200925 0.099466 0.006668 0.004240 0.004240
47 47 19.965304 0.028410 0.199891 0.199891 10.000000 300 10.00000 {'C': 10.0, 'epochs': 300, 'eta': 10.0} 31 0.199891 0.199891 0.199891 0.199891 0.199891 0.199891 0.534462 0.004415 0.000000 0.000000

Let's look at what the best performing parameters were.

In [24]:
clf.best_score_
Out[24]:
0.65932430428352706
In [25]:
clf.best_params_
Out[25]:
{'C': 0.001, 'epochs': 300, 'eta': 0.001}
In [26]:
nnn = clf.best_estimator_
In [27]:
plt.plot(range(len(nnn.cost_)), nnn.cost_)
plt.plot(range(len(nnn.score_)), nnn.score_)
plt.ylabel('Cost/Score')
plt.legend(['cost', 'score'])
plt.xlabel('Epochs')
plt.tight_layout()
plt.show()

Let's see how the score varies across these different parameters.

In [28]:
batch300 = cvres[cvres.param_epochs == 300]

sns.pointplot(batch300.param_C.values, batch300.mean_test_score.values, hue=batch300.param_eta)
plt.title('Score vs. Regularization Coefficient C')
plt.xlabel('C')
plt.show()

Most of the time the regularization coefficient has little effect on the score. The exception is the best eta, where the results were mixed, though in general larger values of C performed better.

In [29]:
batch300 = cvres[cvres.param_epochs == 300]

sns.pointplot(batch300.param_eta.values, batch300.mean_test_score.values, hue=batch300.param_C)
plt.title('Score vs. Learning Rate eta')
plt.xlabel('eta')
plt.show()

Here we see that there is a sweet spot for the value of eta. As eta increases, the score initially improves and then drops off steeply. Once eta gets too large, performance plateaus regardless of the regularization.
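This sweet-spot behavior is a general property of gradient-based training, not just our MLP. As a toy illustration (a hypothetical example, not our network), plain gradient descent on f(w) = w², whose gradient is 2w, converges slowly for a tiny step size, quickly for a moderate one, and diverges once the step size is too large:

```python
# Gradient descent on f(w) = w^2 starting from w = 1.0.
# A tiny eta barely moves, a moderate eta converges, a large eta diverges.
def final_w(eta, steps=50, w=1.0):
    for _ in range(steps):
        w -= eta * 2 * w  # gradient step: grad f(w) = 2w
    return w

for eta in (0.001, 0.1, 1.1):
    print(eta, final_w(eta))
```

With eta = 0.001 the iterate is still near the starting point after 50 steps, with eta = 0.1 it is essentially at the minimum, and with eta = 1.1 each step overshoots and |w| grows without bound, mirroring the plateau of bad scores we see at large eta.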

Comparison to scikit-learn

In [30]:
from sklearn.neural_network import MLPClassifier

sknn = MLPClassifier(hidden_layer_sizes=(30,), 
                     activation='logistic',
                     max_iter=300,
                     learning_rate_init=0.001,
                     alpha=10)

%time sknn.fit(X_train, y_train)
yhat = sknn.predict(X_train)
print('Training Acc:', accuracy_score(yhat, y_train))
Wall time: 1.21 s
Training Acc: 0.544613710555
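Note that the accuracy above is computed on the same data the model was fit on. A held-out estimate could be obtained by splitting off a validation set before fitting; a minimal numpy-only sketch (using synthetic placeholder data, since the real X_train/y_train come from earlier in the notebook):

```python
import numpy as np

# Synthetic placeholder data standing in for X_train / y_train.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = rng.integers(0, 3, size=100)

# Shuffle indices and hold out 20% for validation before fitting,
# so the reported accuracy is not a training-set score.
idx = rng.permutation(len(X))
n_val = len(X) // 5
val_idx, tr_idx = idx[:n_val], idx[n_val:]
X_tr, y_tr = X[tr_idx], y[tr_idx]
X_val, y_val = X[val_idx], y[val_idx]
print(X_tr.shape, X_val.shape)  # (80, 5) (20, 5)
```

The classifier would then be fit on `X_tr`/`y_tr` and scored on `X_val`/`y_val`.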

Confusion Matrix

In [31]:
nn = MultiLayerPerceptron(cost_function='mse', 
                   C=10.0, epochs=300, eta = 0.001,
                   tol = 1e-3)
nn.fit(X_train, y_train, print_progress=1)
Epoch: 6/300
layers: 1 neurons per layer: (30,)
Epoch: 96/300
Out[31]:
<__main__.MultiLayerPerceptron at 0x1eab029d668>
In [32]:
conf_matrix = nn.conf_matrix()
In [33]:
import matplotlib.pyplot as plt

plt.figure()
tb = plt.table(cellText=conf_matrix, loc=(0, 0), cellLoc='center')
tc = tb.properties()['child_artists']
for cell in tc:
    cell.set_height(1/5)
    cell.set_width(1/5)

ax = plt.gca()
ax.set_xticks([])
ax.set_yticks([])
Out[33]:
[]

Exploring Deeper Networks

To better assess the performance of the multilayer perceptron, we would like to see how it does with more than one hidden layer. In the following code, we experiment with up to 3 hidden layers. Due to time constraints and the computational limits of our hardware, we vary only the number of neurons and layers, keeping the other hyperparameters constant; the best-performing single-hidden-layer parameters found earlier are applied here. For the same reason, we chose 20, 30, and 50 as reasonable neuron counts, and test all permutations of those counts across up to 3 layers.
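The search space this implies is small enough to be tractable: 3 single-layer, 9 two-layer, and 27 three-layer configurations, 39 in total. A quick sanity check of the enumeration:

```python
import itertools

num_neu = [20, 30, 50]
# All layer-width tuples for 1, 2, or 3 hidden layers.
configs = [c for n in (1, 2, 3)
           for c in itertools.product(num_neu, repeat=n)]
print(len(configs))  # 39
```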

In [34]:
num_neu_list = []
acc_list = []
f1_list = []
In [35]:
import itertools

num_neu = [20, 30, 50]
# Enumerate every layer-width combination for 1, 2, and 3 hidden layers.
for n_layers in (1, 2, 3):
    for element in itertools.product(num_neu, repeat=n_layers):
        nn = MultiLayerPerceptron(cost_function='mse',
                                  n_hidden_neurons=element,
                                  C=10.0, epochs=300, eta=0.001,
                                  tol=1e-3)
        nn.fit(X_train, y_train, print_progress=1)
        print('Test acc: ', nn.score())
        print("f1 score:", nn.f_score())
        num_neu_list.append(element)
        acc_list.append(nn.score())
        f1_list.append(nn.f_score())
Epoch: 6/300
layers: 1 neurons per layer: (20,)
Epoch: 5/30000
Test acc:  0.644586507073
f1 score: 0.644586507073
layers: 1 neurons per layer: (30,)
Epoch: 3/3000
Test acc:  0.352353101197
f1 score: 0.352353101197
layers: 1 neurons per layer: (50,)
Epoch: 4/30000
Test acc:  0.384589227421
f1 score: 0.384589227421
layers: 2 neurons per layer: (20, 20)
Epoch: 3/3000
Test acc:  0.386629488575
f1 score: 0.386629488575
layers: 2 neurons per layer: (20, 30)
Epoch: 2/30000
Test acc:  0.491634929271
f1 score: 0.491634929271
layers: 2 neurons per layer: (20, 50)
Epoch: 2/30000
Test acc:  0.200557671382
f1 score: 0.200557671382
layers: 2 neurons per layer: (30, 20)
Epoch: 3/30000
Test acc:  0.561411860718
f1 score: 0.561411860718
layers: 2 neurons per layer: (30, 30)
Epoch: 2/30000
Test acc:  0.524687159956
f1 score: 0.524687159956
layers: 2 neurons per layer: (30, 50)
Epoch: 2/30000
Test acc:  0.44640914037
f1 score: 0.44640914037
layers: 2 neurons per layer: (50, 20)
Epoch: 2/30000
Test acc:  0.573517410229
f1 score: 0.573517410229
layers: 2 neurons per layer: (50, 30)
Epoch: 1/30000
Test acc:  0.571817192601
f1 score: 0.571817192601
layers: 2 neurons per layer: (50, 50)
Epoch: 2/30000
Test acc:  0.482725788901
f1 score: 0.482725788901
layers: 3 neurons per layer: (20, 20, 20)
Epoch: 2/30000
Test acc:  0.295769858542
f1 score: 0.295769858542
layers: 3 neurons per layer: (20, 20, 30)
Epoch: 2/30000
Test acc:  0.391118063112
f1 score: 0.391118063112
layers: 3 neurons per layer: (20, 20, 50)
Epoch: 2/30000
Test acc:  0.20239390642
f1 score: 0.20239390642
layers: 3 neurons per layer: (20, 30, 20)
Epoch: 2/30000
Test acc:  0.190900435256
f1 score: 0.190900435256
layers: 3 neurons per layer: (20, 30, 30)
Epoch: 2/30000
Test acc:  0.390437976061
f1 score: 0.390437976061
layers: 3 neurons per layer: (20, 30, 50)
Epoch: 1/30000
Test acc:  0.20239390642
f1 score: 0.20239390642
layers: 3 neurons per layer: (20, 50, 20)
Epoch: 1/30000
Test acc:  0.523463003264
f1 score: 0.523463003264
layers: 3 neurons per layer: (20, 50, 30)
Epoch: 1/30000
Test acc:  0.209602829162
f1 score: 0.209602829162
layers: 3 neurons per layer: (20, 50, 50)
Epoch: 2/30000
Test acc:  0.25612078346
f1 score: 0.25612078346
layers: 3 neurons per layer: (30, 20, 20)
Epoch: 2/30000
Test acc:  0.207086507073
f1 score: 0.207086507073
layers: 3 neurons per layer: (30, 20, 30)
Epoch: 2/30000
Test acc:  0.209602829162
f1 score: 0.209602829162
layers: 3 neurons per layer: (30, 20, 50)
Epoch: 2/30000
Test acc:  0.190900435256
f1 score: 0.190900435256
layers: 3 neurons per layer: (30, 30, 20)
Epoch: 2/30000
Test acc:  0.209602829162
f1 score: 0.209602829162
layers: 3 neurons per layer: (30, 30, 30)
Epoch: 1/30000
Test acc:  0.200557671382
f1 score: 0.200557671382
layers: 3 neurons per layer: (30, 30, 50)
Epoch: 1/30000
Test acc:  0.209602829162
f1 score: 0.209602829162
layers: 3 neurons per layer: (30, 50, 20)
Epoch: 1/30000
Test acc:  0.445525027203
f1 score: 0.445525027203
layers: 3 neurons per layer: (30, 50, 30)
Epoch: 1/30000
Test acc:  0.200557671382
f1 score: 0.200557671382
layers: 3 neurons per layer: (30, 50, 50)
Epoch: 1/30000
Test acc:  0.186887921654
f1 score: 0.186887921654
layers: 3 neurons per layer: (50, 20, 20)
Epoch: 1/30000
Test acc:  0.391934167573
f1 score: 0.391934167573
layers: 3 neurons per layer: (50, 20, 30)
Epoch: 1/30000
Test acc:  0.209602829162
f1 score: 0.209602829162
layers: 3 neurons per layer: (50, 20, 50)
Epoch: 1/30000
Test acc:  0.209602829162
f1 score: 0.209602829162
layers: 3 neurons per layer: (50, 30, 20)
Epoch: 1/30000
Test acc:  0.463955386289
f1 score: 0.463955386289
layers: 3 neurons per layer: (50, 30, 30)
Epoch: 1/30000
Test acc:  0.359153971708
f1 score: 0.359153971708
layers: 3 neurons per layer: (50, 30, 50)
Epoch: 1/30000
Test acc:  0.225040805223
f1 score: 0.225040805223
layers: 3 neurons per layer: (50, 50, 20)
Epoch: 1/30000
Test acc:  0.534684439608
f1 score: 0.534684439608
layers: 3 neurons per layer: (50, 50, 30)
Epoch: 1/30000
Test acc:  0.20239390642
f1 score: 0.20239390642
layers: 3 neurons per layer: (50, 50, 50)
Epoch: 300/300
Test acc:  0.38112078346
f1 score: 0.38112078346

As we can see above, the accuracy score and the micro-averaged F1 score are identical in every case. This is in fact expected rather than a bug: with "micro" averaging in a single-label multiclass setting, every misclassification is counted as both a false positive (for the predicted class) and a false negative (for the true class), so micro precision, micro recall, micro F1, and accuracy all coincide. The scikit-learn documentation notes the same: "Note that for 'micro'-averaging in a multiclass setting with all labels included will produce equal precision, recall and F, while 'weighted' averaging may produce an F-score that is not between precision and recall." In the graph below, we take the F1 score as the main metric to evaluate the performance of each neural network.
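In fact, for single-label multiclass predictions micro F1 always equals accuracy. A numpy-only sketch on made-up labels (hypothetical data, not our players) demonstrates the equality:

```python
import numpy as np

# Hypothetical single-label multiclass predictions.
y_true = np.array([0, 1, 2, 2, 1, 0, 2, 1])
y_pred = np.array([0, 2, 2, 1, 1, 0, 2, 0])

# Micro-averaging pools true positives, false positives, and false
# negatives over all classes before computing precision/recall/F1.
classes = np.unique(y_true)
tp = sum(int(np.sum((y_pred == c) & (y_true == c))) for c in classes)
fp = sum(int(np.sum((y_pred == c) & (y_true != c))) for c in classes)
fn = sum(int(np.sum((y_pred != c) & (y_true == c))) for c in classes)

micro_p = tp / (tp + fp)
micro_r = tp / (tp + fn)
micro_f1 = 2 * micro_p * micro_r / (micro_p + micro_r)
accuracy = float(np.mean(y_true == y_pred))

print(micro_f1, accuracy)  # both 0.625
```

Since every misclassification is simultaneously a false positive for the predicted class and a false negative for the true class, tp + fp = tp + fn = n, so micro precision, micro recall, and hence micro F1 all collapse to accuracy.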

In [36]:
MLP_scores = pd.DataFrame()
MLP_scores['Num_neu'] = num_neu_list
MLP_scores['Accuracy'] = acc_list
MLP_scores["F1"] = f1_list
MLP_scores.head()
Out[36]:
Num_neu Accuracy F1
0 (20,) 0.644587 0.644587
1 (30,) 0.352353 0.352353
2 (50,) 0.384589 0.384589
3 (20, 20) 0.386629 0.386629
4 (20, 30) 0.491635 0.491635
In [37]:
import plotly
import plotly.graph_objs as go
plotly.offline.init_notebook_mode()
MLP_scores.Num_neu = MLP_scores.Num_neu.astype(str)
MLP_scores.dtypes
Out[37]:
Num_neu      object
Accuracy    float64
F1          float64
dtype: object
In [39]:
data = [go.Bar(x=MLP_scores["Num_neu"],
               y=MLP_scores["F1"])]
layout = go.Layout(title="F1 Scores of Multi-Layer Networks")
fig = go.Figure(data = data, layout = layout)
plotly.offline.iplot(fig, filename='basic-bar')

According to the graph above, the network with a single hidden layer of 20 neurons performs best, reaching an F1 score above 0.64. The second best is the network with 2 hidden layers of (50, 20) neurons at 0.5735, slightly ahead of (50, 30) at 0.5718.

Reference

In [ ]: